HARRIS RECURRENCE OF METROPOLIS-WITHIN-GIBBS AND
TRANS-DIMENSIONAL MARKOV CHAINS

By Gareth O. Roberts and Jeffrey S. Rosenthal 
Lancaster University and University of Toronto 

A $\phi$-irreducible and aperiodic Markov chain with stationary probability
distribution will converge to its stationary distribution from
almost all starting points. The property of Harris recurrence allows us 
to replace "almost all" by "all," which is potentially important when 
running Markov chain Monte Carlo algorithms. Full-dimensional 
Metropolis-Hastings algorithms are known to be Harris recurrent. 
In this paper, we consider conditions under which Metropolis-within- 
Gibbs and trans-dimensional Markov chains are or are not Harris 
recurrent. We present a simple but natural two-dimensional counter- 
example showing how Harris recurrence can fail, and also a variety of 
positive results which guarantee Harris recurrence. We also present 
some open problems. We close with a discussion of the practical im- 
plications for MCMC algorithms. 

1. Introduction. Harris recurrence is a concept introduced fifty years 
ago by Harris [8]. More recently, connections between Harris recurrence and 
Markov chain Monte Carlo (MCMC) algorithms were investigated by Tier- 
ney [25] and Chan and Geyer [3]. In this paper, we re-examine Harris recur- 
rence of various MCMC algorithms in a more general context. 

Markov chains with stationary distributions are the basis of MCMC algo- 
rithms. For the algorithm to be valid, it is crucial that the chain converges to 
stationarity in distribution. If the state space is countable and the Markov 
chain is aperiodic and also (classically) irreducible (i.e., has positive proba- 
bility of reaching any state from any other state) , then it is well known that 
convergence to stationarity is guaranteed from all starting states (see, e.g., 
[1, 10, 22]). 

On the other hand, classical irreducibility is unachievable when the state 
space $\mathcal{X}$ is uncountable. A weaker property is $\phi$-irreducibility
[i.e., having positive probability of reaching every subset $A$ with
$\phi(A) > 0$ from every state $x \in \mathcal{X}$, for some nonzero measure
$\phi(\cdot)$]. It is known that a $\phi$-irreducible, aperiodic Markov chain
with stationary probability distribution $\pi(\cdot)$ must still converge to
$\pi(\cdot)$ from $\pi$-almost every starting point (see, e.g., [13, 15, 23]).
However, if a chain is $\phi$-irreducible but not classically irreducible, then
it is indeed possible to have a null set of states from which convergence does
not occur.

Tierney [25] and Chan and Geyer [3] note that this null set of points 
from which convergence fails could cause practical problems for MCMC 
algorithms if the user happens to choose an initial state in this null set. 
Thus, understanding the nature of this null set is important for applications 
of MCMC, as well as theoretically. Chan and Geyer [3] refer to this null set as 
a "measure-theoretic pathology." However, we shall see herein that the null 
set can arise quite naturally, on both discrete and continuous state spaces, 
including for a simple two-dimensional Metropolis-within-Gibbs algorithm 
with continuous densities. 

This paper is structured as follows. Section 2 presents some background 
about Markov chains and Harris recurrence and Theorem 6 proves a num- 
ber of equivalences of Harris recurrence. Section 3 discusses full-dimensional 
Metropolis-Hastings algorithms and Section 4 then discusses Metropolis- 
within-Gibbs algorithms. Example 9 demonstrates that a simple two-dimen- 
sional Metropolis-within-Gibbs algorithm with continuous target and pro- 
posal densities, although irreducible and aperiodic, can still fail to be Harris 
recurrent or to converge to stationarity from all starting points. Sections 4 
and 5 prove various positive results which guarantee Harris recurrence for 
Metropolis-within-Gibbs algorithms under various conditions and Section 6 
does the same for trans-dimensional Markov chains. 

2. Markov chains and Harris recurrence. Consider a Markov chain $\{X_n\}$
with transition probabilities $P(x,\cdot)$, on a state space $\mathcal{X}$ with
$\sigma$-algebra $\mathcal{F}$. Let $P^n(x,\cdot)$ be the $n$-step transition
kernel and for $A \in \mathcal{F}$ let $\tau_A = \inf\{n \ge 1 : X_n \in A\}$
be the first return time to $A$, with $\tau_A = \infty$ if the chain never
returns to $A$.

Recall that a Markov chain is $\phi$-irreducible if there exists a nonzero
$\sigma$-finite measure $\psi(\cdot)$ on $(\mathcal{X},\mathcal{F})$ such that
$\mathbf{P}[\tau_A < \infty \mid X_0 = x] > 0$ for all $x \in \mathcal{X}$ and
all $A \in \mathcal{F}$ with $\psi(A) > 0$. The probability distribution
$\pi(\cdot)$ on $(\mathcal{X},\mathcal{F})$ is stationary for the chain if
$\int_{\mathcal{X}} \pi(dx) P(x,A) = \pi(A)$ for all $A \in \mathcal{F}$.

The period of a $\phi$-irreducible chain with stationary distribution
$\pi(\cdot)$ is the largest $D \in \mathbf{N}$ (the set of all positive
integers) for which there exist disjoint subsets
$\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_D \in \mathcal{F}$ with
$\pi(\mathcal{X}_i) > 0$, such that $P(x,\mathcal{X}_{i+1}) = 1$ for all
$x \in \mathcal{X}_i$ ($1 \le i \le D-1$) and $P(x,\mathcal{X}_1) = 1$ for all
$x \in \mathcal{X}_D$. If $D = 1$, then the chain is aperiodic.

In terms of these definitions, we have the following classical result, as in, 
for example, [25], page 1758 or [23]. (See also [15] and [13].) 

Proposition 1. Consider a $\phi$-irreducible, aperiodic Markov chain with
stationary probability distribution $\pi(\cdot)$. Let $G$ be the set of
$x \in \mathcal{X}$ such that
$\lim_{n\to\infty} \|P^n(x,\cdot) - \pi(\cdot)\| = 0$. Then $\pi(G) = 1$.

We also note that aperiodicity is not essential in Proposition 1: 

Proposition 2. Consider a $\phi$-irreducible Markov chain with stationary
probability distribution $\pi(\cdot)$ and period $D \ge 1$. Let $G$ be the set
of $x \in \mathcal{X}$ such that
$\lim_{n\to\infty} \|(1/D) \sum_{r=1}^{D} P^{nD+r}(x,\cdot) - \pi(\cdot)\| = 0$.
Then $\pi(G) = 1$.

Proof. If $D = 1$, then this reduces to Proposition 1 above. If $D > 1$,
then let $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_D$ be as in the
above definition of period. Then $P^D$ is $\phi$-irreducible and aperiodic
when restricted to each $\mathcal{X}_i$, with stationary distribution
$\pi_i(\cdot)$ such that $\pi(\cdot) = (1/D)\sum_{i=1}^{D} \pi_i(\cdot)$. It
follows from Proposition 1 that for $\pi_D$-a.e. $x \in \mathcal{X}_D$ and for
any $1 \le r \le D$, we have
$\lim_{n\to\infty} \|P^{nD+r}(x,\cdot) - \pi_r(\cdot)\| = 0$. Hence, for
$\pi_i$-a.e. $x \in \mathcal{X}_i$, we have
$\lim_{n\to\infty} \|P^{nD+r+D-i}(x,\cdot) - \pi_r(\cdot)\| = 0$. The result
follows by averaging over $1 \le r \le D$ and using the triangle inequality. □
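
As an illustration of why the average over $r$ appears in Proposition 2, the following minimal Python sketch uses a hypothetical two-state "flip" chain (not from the paper) with period $D = 2$. The helper names `mat_pow` and `averaged_row` are assumptions introduced only for this example. The individual rows $P^n(x,\cdot)$ alternate forever, while the averaged rows already equal the stationary law $(1/2, 1/2)$.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, n):
    """Return the n-step transition matrix P^n (n >= 0)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


def averaged_row(p, x, n, period):
    """Return (1/D) * sum_{r=1}^{D} P^{nD+r}(x, .) as a list."""
    total = [0.0] * len(p)
    for r in range(1, period + 1):
        row = mat_pow(p, n * period + r)[x]
        total = [t + v for t, v in zip(total, row)]
    return [t / period for t in total]


# Hypothetical example: deterministic flip chain on {0, 1}, period D = 2.
FLIP = [[0.0, 1.0], [1.0, 0.0]]
```

Here `mat_pow(FLIP, n)[0]` oscillates between the two point masses, whereas `averaged_row(FLIP, 0, n, 2)` is constant in `n`.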

The above conclusions still allow for the possibility of a null set $G^c$
from which convergence fails. This null set can indeed arise, even for simple
examples, on both discrete and continuous state spaces:

Example 3 ([5, 17]). Let $\mathcal{X} = \{1, 2, \ldots\}$. Let
$P(1,\{1\}) = 1$, and for $x \ge 2$, $P(x,\{1\}) = 1/x^2$ and
$P(x,\{x+1\}) = 1 - (1/x^2)$. This chain has stationary distribution
$\pi(\cdot) = \delta_1(\cdot)$ and it is $\phi$-irreducible (with respect to
$\pi$) and aperiodic. On the other hand, if $X_0 = x \ge 2$, then
$\mathbf{P}[X_n = x + n \text{ for all } n] = \prod_{j=x}^{\infty} (1 - (1/j^2)) > 0$
so that $\|P^n(x,\cdot) - \pi(\cdot)\| \not\to 0$. Hence, convergence holds
only from the set $G = \{1\}$, but fails to hold from the set
$G^c = \{2, 3, 4, \ldots\}$. Of course, this example is not irreducible in the
classical sense since no state $x \ge 2$ is reachable from the state 1.
However, it is still indecomposable (see, e.g., [21]).
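
The positivity of the escape probability in Example 3 can be checked directly, since the product telescopes: $\prod_{j=x}^{N} (1 - 1/j^2) = (x-1)(N+1)/(xN)$, which tends to $(x-1)/x$ as $N \to \infty$. The following Python sketch verifies this identity in exact rational arithmetic; the function names and the finite `horizon` argument are assumptions made for illustration only.

```python
from fractions import Fraction


def escape_probability(x, horizon):
    """Exact probability that the Example 3 chain, started at x >= 2,
    moves x -> x+1 -> ... without jumping to state 1, using the
    factors j = x, ..., horizon of the infinite product."""
    prob = Fraction(1)
    for j in range(x, horizon + 1):
        prob *= 1 - Fraction(1, j * j)
    return prob


def telescoped_form(x, horizon):
    """Closed form (x - 1)(N + 1) / (x N) of the same partial product."""
    return Fraction(x - 1, x) * Fraction(horizon + 1, horizon)
```

For instance, starting from $x = 2$ the partial products decrease to $1/2$, so the chain escapes to infinity with probability $1/2$ and never reaches the stationary point mass at 1 on that event.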

Example 4 (Continuous state space version). Let $\mathcal{X} = [0,1]$.
Define the transition kernel $P(x,\cdot)$ as follows. If $x = 1/m$ for some
positive integer $m$, then
$P(x,\cdot) = x^2\,\mathrm{Uniform}[0,1] + (1 - x^2)\,\delta_{1/(m+1)}(\cdot)$.
For all other $x$, $P(x,\cdot) = \mathrm{Uniform}[0,1]$. Then the chain has
stationary distribution $\pi(\cdot) = \mathrm{Uniform}[0,1]$ and it is
$\phi$-irreducible (with respect to $\pi$) and aperiodic. On the other hand,
if $X_0 = 1/m$ for some $m \ge 2$, then
$\mathbf{P}[X_n = 1/(m+n) \text{ for all } n] = \prod_{j=m}^{\infty} (1 - (1/j^2)) > 0$
so that $\|P^n(x,\cdot) - \pi(\cdot)\| \not\to 0$. Hence, convergence fails to
hold from the set $G^c = \{1/2, 1/3, 1/4, \ldots\}$.

To rectify the problems of the null set $G^c$, we consider Harris recurrence,
a concept developed by Harris [8] and introduced to statisticians in the
important works of Tierney [25] and Chan and Geyer [3] (see also [6]).

Definition 5. A $\phi$-irreducible Markov chain with stationary distribution
$\pi(\cdot)$ is Harris recurrent if for all $A \subseteq \mathcal{X}$ with
$\pi(A) > 0$ and all $x \in \mathcal{X}$, we have
$\mathbf{P}(\tau_A < \infty \mid X_0 = x) = 1$.

We now prove a number of equivalences. 

Theorem 6. For a $\phi$-irreducible Markov chain with stationary probability
distribution $\pi(\cdot)$ and period $D \ge 1$, the following are equivalent:

(i) The chain is Harris recurrent.

(ii) For all $A \subseteq \mathcal{X}$ with $\pi(A) > 0$ and all
$x \in \mathcal{X}$, we have $\mathbf{P}(X_n \in A \text{ i.o.} \mid X_0 = x) = 1$.
(Here i.o. means "infinitely often," i.e., for infinitely many different
times $n$.)

(iii) For all $x \in \mathcal{X}$,
$\lim_{n\to\infty} \|(1/D) \sum_{r=1}^{D} P^{nD+r}(x,\cdot) - \pi(\cdot)\| = 0$.

(iv) For all $x \in \mathcal{X}$, $\mathbf{P}[\tau_G < \infty \mid X_0 = x] = 1$,
where $G$ is as in Proposition 2.

(v) For all $x \in \mathcal{X}$ and all $A \in \mathcal{F}$ with $\pi(A) = 1$,
$\mathbf{P}[\tau_A < \infty \mid X_0 = x] = 1$.

(vi) For all $x \in \mathcal{X}$ and all $A \in \mathcal{F}$ with $\pi(A) = 0$,
$\mathbf{P}[X_n \in A \text{ for all } n \mid X_0 = x] = 0$.

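Proof. (ii) $\Rightarrow$ (i); (i) $\Rightarrow$ (v); (v) $\Rightarrow$ (iv);
and (v) $\Leftrightarrow$ (vi): Immediate.

(i) $\Rightarrow$ (ii): Suppose to the contrary that (ii) does not hold. Then
there is some $A \subseteq \mathcal{X}$ with $\pi(A) > 0$, some
$x \in \mathcal{X}$ and some $N \in \mathbf{N}$ such that
$\mathbf{P}(X_n \notin A \ \forall n \ge N \mid X_0 = x) > 0$. Integrating
over choices of $y = X_N$, this implies there is some $y \in \mathcal{X}$ with
$\mathbf{P}(\tau_A = \infty \mid X_0 = y) > 0$, contradicting (i).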

(iv) $\Rightarrow$ (iii): From Proposition 2, once the chain reaches $G$, it
will converge. The convergence in (iii) then follows. More formally,
conditional on the first hitting time $\tau_G$ and the corresponding chain
value $X_{\tau_G}$, the chain will converge in total variation distance as in
(iii). Statement (iii) then follows by integrating over all choices of
$\tau_G$ and $X_{\tau_G}$ and using the triangle inequality for total
variation distance.

(iii) $\Rightarrow$ (i): If $\phi(A) > 0$ [where $\phi(\cdot)$ is an
irreducibility measure], then we must have $\pi(A) > 0$ (see, e.g., Lemma 3
of [23]), so by (iii) we have that for all $x \in \mathcal{X}$,
$\sum_{r=1}^{D} P^{nD+r}(x,A) \to D\pi(A) > 0$ and, in particular,
$\sum_{n=1}^{\infty} P^n(x,A) = \infty$. It then follows from Theorem 9.0.1
of [13], using their definition of recurrence on page 182, that we can find an
absorbing subset $H \subseteq \mathcal{X}$ such that the chain restricted to
$H$ is Harris recurrent and $\pi(H) = 1$. Then
$(1/D)\sum_{r=1}^{D} P^{nD+r}(x,H) \to 1$, so $P^n(x,H) \to 1$. Hence, the
chain will eventually reach $H$ with probability 1. Since the chain restricted
to $H$ is Harris recurrent, it will then eventually reach any $A$ with
$\pi(A) = 1$, with probability 1, thus establishing (i). □

For completeness, we note another method for verifying Harris recurrence 
(although we do not use it here). Given a Markov chain with stationary 
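distribution $\pi(\cdot)$, a subset $C \in \mathcal{F}$ is small if
$\pi(C) > 0$ and there is an $\epsilon > 0$ and a probability measure
$\nu(\cdot)$ on $(\mathcal{X},\mathcal{F})$ such that
$P(x,A) \ge \epsilon\,\nu(A)$ for all $A \in \mathcal{F}$ and $x \in C$. It
easily follows that we must have $\nu \ll \pi$.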

Proposition 7 ([13]). If a Markov chain with stationary distribution
$\pi(\cdot)$ has a small set $C$ with the property that
$\mathbf{P}[X_n \in C \text{ i.o.} \mid X_0 = x] = 1$ for all
$x \in \mathcal{X}$, then the chain is Harris recurrent.

Proof. Using the splitting technique (see, e.g., [15]), each time it is
in $C$, we can regard the chain as proceeding by moving according to
$\nu(\cdot)$ with probability $\epsilon$. If the chain returns to $C$
infinitely often, then with probability 1 it will eventually move according
to $\nu(\cdot)$. Since $\nu \ll \pi$, this means it will eventually leave any
set of null $\pi$-measure. Hence, the result follows from Theorem 6(vi). □

Various drift conditions can be used to establish that
$\mathbf{P}[X_n \in C \text{ i.o.} \mid X_0 = x] = 1$ for all
$x \in \mathcal{X}$ and thus establish Harris recurrence. For example, it
follows from [13], Theorem 13.0.1, that for $\phi$-irreducible chains, it
suffices that there exists a measurable function
$V : \mathcal{X} \to (0,\infty)$ such that
$\mathbf{E}[V(X_1) \mid X_0 = x] \le V(x) - 1 + b\,\mathbf{1}_C(x)$ for some
$b < \infty$. Alternatively, it follows from [13], Theorem 8.4.3 (see also
[6]) that for $\phi$-irreducible chains, Harris recurrence follows if there
exists a measurable function $V : \mathcal{X} \to (0,\infty)$ such that
$V^{-1}((0,a])$ is small for all $a > 0$ and such that
$\mathbf{E}[V(X_1) \mid X_0 = x] \le V(x)$ for all
$x \in \mathcal{X} \setminus C$.

Remark. We note that the null sets related to Harris recurrence are of 
an "extreme" kind in the sense that the chain may fail entirely to converge 
from the null set. Less radically, one could consider chains which converge 
from everywhere but which have a slower qualitative rate of convergence 
from some null set. For example, it should be possible to construct Markov 
chains which are Harris recurrent and geometrically ergodic but which con- 
verge at a subgeometric rate from a certain null set of initial states; or, 
chains which are polynomially ergodic at a particular polynomial rate a but 
which fail to converge at the polynomial rate a from some null set; or, chains 
which are geometrically ergodic but which fail to converge from one null set, 
converge polynomially from another null set, converge subpolynomially from 
a third null set, and so on. In this context, Harris recurrence can be seen 
as one in a series of properties ensuring that "things are not worse when 
starting from a null set than when starting from anywhere else." 

3. Full-dimensional Metropolis-Hastings algorithms. Let $\mathcal{X}$ be some
state space with $\sigma$-algebra $\mathcal{F}$. Let $\pi(\cdot)$ be a
probability distribution on $(\mathcal{X},\mathcal{F})$ having unnormalized
density function $f : \mathcal{X} \to (0,\infty)$ with respect to some
reference measure $\nu(\cdot)$ so that
$\int_{\mathcal{X}} f(x)\,\nu(dx) < \infty$ and

$$\pi(A) = \frac{\int_A f(x)\,\nu(dx)}{\int_{\mathcal{X}} f(x)\,\nu(dx)}, \qquad A \in \mathcal{F}.$$

Note that we assume $f > 0$ on $\mathcal{X}$ or, equivalently, that
$\mathcal{X}$ is defined to be the support of $f$. To avoid trivialities, we
assume that $\pi(\cdot)$ is not concentrated at a single state, that is, that
$\pi\{x\} < 1$ for all $x \in \mathcal{X}$.

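Let $q : \mathcal{X} \times \mathcal{X} \to [0,\infty)$ be any jointly
measurable function such that $\int q(x,y)\,\nu(dy) = 1$ for all
$x \in \mathcal{X}$. Define the Markov kernel $Q(x,\cdot)$ by
$Q(x,A) = \int_A q(x,y)\,\nu(dy)$ for $x \in \mathcal{X}$ and let

$$\alpha(x,y) = \min\left[1, \frac{f(y)\,q(y,x)}{f(x)\,q(x,y)}\right], \qquad x, y \in \mathcal{X}$$

[with $\alpha(x,y) = 1$ if $f(x)q(x,y) = 0$].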

The full-dimensional Metropolis-Hastings algorithm [9, 12, 25] proceeds
as follows. Given that the chain is in state $X_n$ at time $n$, it generates a
"proposal state" $Y_{n+1} \sim Q(X_n,\cdot)$. Then, with probability
$\alpha(X_n, Y_{n+1})$, it "accepts" this proposal and sets
$X_{n+1} = Y_{n+1}$; otherwise, with probability $1 - \alpha(X_n, Y_{n+1})$,
it "rejects" this proposal and sets $X_{n+1} = X_n$. It is easy to check that
$\pi(\cdot)$ is then stationary for the Markov chain $\{X_n\}$.
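
The accept/reject step just described can be sketched in Python as follows. This is only an illustrative sketch: the function names, the standard normal target and the unit normal random-walk proposal are assumptions chosen for the example, not part of the paper.

```python
import math
import random


def acceptance_probability(f, q, x, y):
    """alpha(x, y) = min(1, f(y) q(y, x) / (f(x) q(x, y))),
    with the convention alpha = 1 when f(x) q(x, y) = 0."""
    denom = f(x) * q(x, y)
    if denom == 0:
        return 1.0
    return min(1.0, f(y) * q(y, x) / denom)


def mh_step(f, q, propose, x, rng):
    """One Metropolis-Hastings transition from state x."""
    y = propose(x, rng)                      # Y_{n+1} ~ Q(X_n, .)
    if rng.random() < acceptance_probability(f, q, x, y):
        return y                             # accept: X_{n+1} = Y_{n+1}
    return x                                 # reject: X_{n+1} = X_n


# Hypothetical example ingredients (assumptions for illustration).
def target(x):
    """Unnormalized standard normal density."""
    return math.exp(-0.5 * x * x)


def normal_density(x, y):
    """Proposal density q(x, y) of a N(x, 1) draw evaluated at y."""
    return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi)


def normal_proposal(x, rng):
    return rng.gauss(x, 1.0)
```

A chain is then obtained by repeatedly calling `mh_step(target, normal_density, normal_proposal, x, rng)` with `rng = random.Random(seed)`.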

Clearly, any such Markov chain can be decomposed as 

$$P(x,A) = (1 - r(x))\,M(x,A) + r(x)\,\delta_x(A), \qquad x \in \mathcal{X},\ A \subseteq \mathcal{X},$$

where $\delta_x(\cdot)$ is a point-mass at $x$,
$r(x) = \int q(x,y)[1 - \alpha(x,y)]\,\nu(dy)$ is the probability of rejection
when starting at $X_n = x$ and $M(x,\cdot)$ is the kernel conditional on
moving (i.e., on $X_{n+1} \ne X_n$). In particular, the probability
distribution $M(x,\cdot)$ is absolutely continuous with respect to
$\nu(\cdot)$ for all $x \in \mathcal{X}$.

Regarding Harris recurrence, we have the following result, which was orig- 
inally proved by Tierney [25] using the theory of harmonic functions: 

Theorem 8 ([25]). Every $\phi$-irreducible, full-dimensional
Metropolis-Hastings algorithm is Harris recurrent.

Proof. Since the chain is $\phi$-irreducible and $\pi\{x\} < 1$, we must have
$r(x) < 1$ for all $x \in \mathcal{X}$. Suppose $\pi(A) = 1$. Then
$\pi(A^c) = 0$, and so, as we are assuming that $f > 0$ throughout
$\mathcal{X}$, we also have $\nu(A^c) = 0$. Hence, by absolute continuity,
$M(x,A^c) = 0$, that is, $M(x,A) = 1$. It follows that, if the chain is at
$x$, then it will eventually move according to $M(x,\cdot)$, at which point it
will necessarily move into $A$. The result then follows from Theorem 6(v). □



4. Metropolis-within-Gibbs algorithms. We now define Metropolis-within-Gibbs
Markov chains [12].

For simplicity, let $\mathcal{X}$ be an open subset of $\mathbf{R}^d$ with
Borel $\sigma$-algebra $\mathcal{F}$ and (unnormalized) target density
$f : \mathcal{X} \to (0,\infty)$ with
$\int_{\mathcal{X}} f(x)\,\lambda(dx) < \infty$ [where $\lambda(\cdot)$ is
$d$-dimensional Lebesgue measure]. For $1 \le i \le d$, let
$q_i : \mathcal{X} \times \mathbf{R} \to [0,\infty)$ be jointly measurable
with $\int q_i(x,z)\,dz = 1$ for all $x \in \mathcal{X}$ (where $dz$ is
one-dimensional Lebesgue measure).

Let $Q_i(x,\cdot)$ be the Markov kernel on $\mathbf{R}^d$ which replaces the
$i$th coordinate by a draw from the density $q_i(x,\cdot)$, but leaves the
other coordinates unchanged. That is,

$$Q_i(x, S_{i,a,b}) = \int_a^b q_i(x,z)\,dz,$$

where

$$S_{i,a,b} = \{y \in \mathcal{X} : y_j = x_j \text{ for } j \ne i \text{ and } a \le y_i \le b\}.$$

To avoid technicalities and special cases, assume that
$Q_i(x,\mathcal{X}) > 0$ for all $x \in \mathcal{X}$ and $1 \le i \le d$, and
also that each $q_i$ is symmetrically positive in the sense that

$$q_i((x_1,\ldots,x_{i-1},y,x_{i+1},\ldots,x_d), z) > 0 \iff q_i((x_1,\ldots,x_{i-1},z,x_{i+1},\ldots,x_d), y) > 0.$$

For $x, y \in \mathbf{R}^d$ and $1 \le i \le d$, let

$$\alpha_i(x,y) = \min\left[1, \frac{f(y)\,q_i(y,x)}{f(x)\,q_i(x,y)}\right]$$

[with $\alpha_i(x,y) = 1$ if $f(x)q_i(x,y) = 0$]. Let $P_i$ be the kernel
which proceeds as follows. Given $X_n$, it generates a proposal
$Y_{n+1} \sim Q_i(X_n,\cdot)$. Then, with probability
$\alpha_i(X_n, Y_{n+1})$, it accepts this proposal and sets
$X_{n+1} = Y_{n+1}$; otherwise, with probability
$1 - \alpha_i(X_n, Y_{n+1})$, it rejects the proposal and sets
$X_{n+1} = X_n$.

In terms of these definitions, the Metropolis-within-Gibbs Markov chain
proceeds as follows. Random variables $I_1, I_2, \ldots$ taking values in
$\{1, 2, \ldots, d\}$ are chosen according to some scheme. (The two most
common schemes are random-scan, where $\{I_n\}$ are i.i.d. uniform on
$\{1, 2, \ldots, d\}$, and deterministic-scan, where $I_1 = 1$, $I_2 = 2$,
..., $I_d = d$, $I_{d+1} = 1$, ....) Then for $n = 0, 1, 2, \ldots$, given
$X_n$, the chain generates $X_{n+1} \sim P_{I_{n+1}}(X_n,\cdot)$. It is
straightforward to verify that this chain has stationary distribution
$\pi(\cdot)$ given by

$$\pi(A) = \frac{\int_A f(x)\,\lambda(dx)}{\int_{\mathcal{X}} f(x)\,\lambda(dx)}.$$
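
The two scan schemes can be sketched in Python as below. This is a minimal sketch under stated assumptions: the proposals are symmetric normal random walks (so the proposal densities cancel in $\alpha_i$), coordinates are indexed from 0, and the names `mwg_update`, `random_scan_step`, `deterministic_scan_sweep` and the Gaussian toy target are hypothetical choices for illustration.

```python
import math
import random


def mwg_update(log_f, x, i, rng, scale=1.0):
    """Update coordinate i of x by one Metropolis step (kernel P_i).

    Assumes a symmetric N(x_i, scale^2) proposal, so that
    alpha_i = min(1, f(proposal) / f(x)). log_f may return -inf
    outside the support, in which case the proposal is rejected.
    """
    proposal = list(x)
    proposal[i] = rng.gauss(x[i], scale)
    log_ratio = log_f(proposal) - log_f(x)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return tuple(proposal)
    return tuple(x)


def random_scan_step(log_f, x, rng):
    """Random scan: the direction is drawn uniformly at each step."""
    i = rng.randrange(len(x))
    return mwg_update(log_f, x, i, rng)


def deterministic_scan_sweep(log_f, x, rng):
    """Deterministic scan: one pass through directions 0, 1, ..., d-1."""
    for i in range(len(x)):
        x = mwg_update(log_f, x, i, rng)
    return x


def log_gaussian(x):
    """Hypothetical toy target: unnormalized standard normal on R^d."""
    return -0.5 * sum(v * v for v in x)
```

Each call to `random_scan_step` changes at most one coordinate, which is exactly the feature that the Harris recurrence results below have to deal with.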

The above description defines Metropolis-within-Gibbs chains as we shall 
study them. We can now ask, under what conditions are such chains Harris 
recurrent? One might think that a result similar to Theorem 8 holds for
Metropolis-within-Gibbs algorithms, at least when the target and proposal 
densities are continuous. However, surprisingly, this is false: 

Example 9. We present a Metropolis-within-Gibbs algorithm on an open subset
$\mathcal{X} \subseteq \mathbf{R}^2$, with stationary distribution
$\pi(\cdot)$, with continuous target and proposal densities, which is
$\phi$-irreducible and aperiodic, but which fails to converge in distribution
to $\pi(\cdot)$ from an uncountable number of different starting points
(having total $\pi$-measure zero, of course).

Let $\mathcal{X} = \{(x_1,x_2) \in \mathbf{R}^2 : x_1 > 1\}$ and define the
function $f : \mathcal{X} \to (0,\infty)$ by
$f(x_1,x_2) = (e/2)\exp(x_1 - |x_2| e^{2x_1})$ (so that
$\int_{\mathcal{X}} f = 1$). Let $Q_1(x,\cdot)$ and $Q_2(x,\cdot)$ be
symmetric unit normal proposals so that
$q_i(x,z) = (2\pi)^{-1/2}\exp(-(z - x_i)^2/2)$ ($i = 1, 2$). Then, clearly,
$f$, $q_1$ and $q_2$ are positive continuous functions; it follows that the
chain is $\phi$-irreducible where $\phi$ is Lebesgue measure.

Consider the random-scan (say) Metropolis-within-Gibbs Markov chain 
corresponding to these choices. We shall prove that this chain is not Harris 
recurrent. Indeed, let $S = \{(x_1,0) : x_1 > 1\}$ be the part of the line
$\{x_2 = 0\}$ which lies in $\mathcal{X}$. We claim that if the chain starts
at any initial state in $S$, then there is positive probability that it will
drift off to $x_1 \to \infty$ without ever updating $x_2$, that is, without
ever leaving $S$. Then, since $\pi(S) = 0$, it follows that if $X_0 \in S$,
the chain will fail to converge to $\pi(\cdot)$.

To establish the claim, consider first a Markov chain $\{W_n\}$ equivalent to
just the first coordinate of $X_n$, under just the kernel $P_1$ (which
proposes moves only in the $x_1$ direction), restricted to the state space
$S$. Now, on $S$, the density $f$ is proportional to $e^{x_1}$. It follows
that for any $\delta > 0$ and $x_1 > 1$,
$\alpha_1((x_1,0),(x_1 - \delta,0)) \le e^{-\delta}$, while
$\alpha_1((x_1,0),(x_1 + \delta,0)) = 1$. That is, proposals to increase $x_1$
will all be accepted, while a positive fraction of the proposals to decrease
$x_1$ will be rejected. It follows that on $S$, the kernel $P_1$ has positive
drift. Hence, there exists $c > 0$ such that for all $x_1 > 1$,

$$\mathbf{P}[W_n \ge cn \text{ for all sufficiently large } n \mid W_0 = x_1] > 0.$$

On the other hand, the density $f(x_1,x_2)$ as a function of $x_2$ alone
(i.e., with $x_1$ regarded as a constant) is proportional to
$\exp(-|x_2| e^{2x_1})$. It follows that the probability of accepting a
proposal in the $x_2$ direction is equal to
$\mathbf{E}[\exp(-|Z| e^{2x_1})]$, where $Z \sim N(0,1)$, which is less than

$$\int_{-\infty}^{\infty} \exp(-|z| e^{2x_1})\,dz = 2e^{-2x_1}.$$

Now, since $\sum_n 2e^{-2cn} < \infty$, it follows from the Borel-Cantelli
lemma (e.g., [22], Theorem 3.4.2) that there is positive probability that all
proposed moves in the $x_2$ direction will be rejected. That is, for any
$x_1 > 1$,

$$\mathbf{P}[X_n \in S \text{ for all } n \mid X_0 = (x_1,0)] > 0,$$

thus proving the claim. (We shall see in Corollary 18 below that the
"problem" with this example is that the one-dimensional integral of $f$ over
the line $\{x_2 = 0\}$ is infinite.)
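
The two acceptance probabilities used in Example 9 can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, and the expectation $\mathbf{E}[\exp(-|Z| e^{2x_1})]$ is approximated by a simple midpoint rule on $[-8, 8]$ (an assumption that ignores the negligible normal tail mass beyond that range). It compares the result with the bound $2e^{-2x_1}$.

```python
import math


def log_f(x1, x2):
    """log of f(x1, x2) = (e/2) exp(x1 - |x2| e^{2 x1}) on {x1 > 1}."""
    return 1.0 - math.log(2.0) + x1 - abs(x2) * math.exp(2.0 * x1)


def accept_prob_x1_on_S(x1, new_x1):
    """alpha_1 for a move (x1, 0) -> (new_x1, 0); zero outside the state space."""
    if new_x1 <= 1.0:
        return 0.0
    return math.exp(min(0.0, log_f(new_x1, 0.0) - log_f(x1, 0.0)))


def accept_prob_x2_from_S(x1, z):
    """alpha_2 for a move (x1, 0) -> (x1, z), i.e. exp(-|z| e^{2 x1})."""
    return math.exp(min(0.0, log_f(x1, z) - log_f(x1, 0.0)))


def expected_accept_x2(x1, half_width=8.0, steps=100_000):
    """Midpoint-rule approximation of E[exp(-|Z| e^{2 x1})], Z ~ N(0, 1)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        z = -half_width + (k + 0.5) * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += density * accept_prob_x2_from_S(x1, z)
    return total * h
```

Upward moves along $S$ are always accepted and downward moves by $\delta$ are accepted with probability $e^{-\delta}$, while the chance of accepting any $x_2$ move shrinks like $e^{-2x_1}$ as $x_1$ grows, which is what makes the Borel-Cantelli argument work.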

Remark. In the above example, it is also possible, if desired, to modify $f$
to decrease to 0 near the boundary $\{x_1 = 1\}$ in order to make $f$ be
continuous throughout $\mathbf{R}^2$ (not just on $\mathcal{X}$) without
affecting the result.

To proceed, decompose $P_i$ as
$P_i(x,\cdot) = [1 - r(x)]\,M_i(x,\cdot) + r(x)\,\delta_x(\cdot)$ so that
$M_i(x,\cdot)$ is the kernel corresponding to moving (i.e., both proposing
and accepting) in the $i$th direction.

Lemma 10. Let $(i_1, i_2, \ldots, i_n)$ be any sequence of coordinate
directions. Assume that each of the $d$ coordinate directions appears at
least once in the sequence $(i_1, i_2, \ldots, i_n)$. Then
$M_{i_1} M_{i_2} \cdots M_{i_n}$ is an absolutely continuous kernel, that is,
if $A \in \mathcal{F}$ with $\lambda(A) = 0$, then
$(M_{i_1} M_{i_2} \cdots M_{i_n})(x,A) = 0$ for all $x \in \mathcal{X}$.

Proof. We shall compute a density for
$(M_{i_1} M_{i_2} \cdots M_{i_n})(x,\cdot)$. The result then follows since
every distribution having a density is absolutely continuous. Let

$$J = \{m : 1 \le m \le n,\ i_j \ne i_m \text{ for } m < j \le n\},$$

that is, $J$ is the set of "last time the chain moved in direction $i$" for
each coordinate $i$. (Thus, $|J| = d$.) For $1 \le m \le d$, let
$S_m \subseteq \mathbf{R}$ be any Borel subset so that
$S = S_1 \times \cdots \times S_d$ is an arbitrary measurable rectangle in
$\mathbf{R}^d$. Then define subsets $R_m \subseteq \mathbf{R}$ for
$1 \le m \le n$ by letting $R_m = S_{i_m}$ if $m \in J$, and
$R_m = \mathbf{R}$ otherwise.

We then compute that

$$(M_{i_1} M_{i_2} \cdots M_{i_n})(x,S) = \int_{R_1} \cdots \int_{R_n} q_{i_1}(x_1,x_2)\,\alpha(x_1,x_2)\, q_{i_2}(x_2,x_3)\,\alpha(x_2,x_3) \times \cdots \times q_{i_{n-1}}(x_{n-1},x_n)\,\alpha(x_{n-1},x_n)\; dx_1\, dx_2 \cdots dx_n.$$

It follows that the density of $(M_{i_1} M_{i_2} \cdots M_{i_n})(x,\cdot)$ is
given by the above formula, but with the integration over the variables
$\{x_j;\ j \in J\}$ omitted. Hence,
$(M_{i_1} M_{i_2} \cdots M_{i_n})(x,\cdot)$ has a density and is thus
absolutely continuous. □



From the law of total probability, we therefore obtain the following: 

Corollary 11. If $A$ has Lebesgue measure 0, then
$\mathbf{P}[X_n \in A \mid X_0 = x_0] \le \mathbf{P}[D_n]$ where $D_n$ is the
event that by time $n$, the chain has not yet moved in each coordinate
direction.
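
To make the bookkeeping in Lemma 10 and Corollary 11 concrete, the following Python sketch computes the set $J$ of "last move times" for a given sequence of accepted move directions, and checks whether every direction has been used (the complement of the event $D_n$). The function names and the use of 1-indexed directions and times are assumptions made for this illustration.

```python
def last_move_times(directions):
    """Return J = {m : i_j != i_m for all m < j <= n}, with times 1-indexed.

    `directions` is the sequence (i_1, ..., i_n) of coordinate directions
    in which the chain actually moved.
    """
    n = len(directions)
    return {
        m for m in range(1, n + 1)
        if all(directions[j - 1] != directions[m - 1]
               for j in range(m + 1, n + 1))
    }


def moved_in_every_direction(directions, d):
    """True if each of the directions 1, ..., d appears at least once."""
    return set(directions) == set(range(1, d + 1))
```

For the sequence `[1, 2, 1, 2, 2]` with $d = 2$, direction 1 is last used at time 3 and direction 2 at time 5, so $J = \{3, 5\}$ and $|J| = d$ as claimed in the proof.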

This allows us to prove the following: 

Theorem 12. Consider a $\phi$-irreducible Metropolis-within-Gibbs Markov
chain. Suppose that from any initial state $x$, with probability 1, the chain
will eventually move at least once in each coordinate direction. Then the
chain is Harris recurrent.

Proof. The hypothesis implies that for all $x \in \mathcal{X}$,
$\lim_{n\to\infty} \mathbf{P}[D_n \mid X_0 = x] = 0$. Now, let $\pi(A) = 0$.
Then since $f > 0$ on $\mathcal{X}$, we must also have $\lambda(A) = 0$.
Hence, by Corollary 11, we must have

$$\mathbf{P}[X_n \in A\ \forall n \mid X_0 = x] \le \lim_{n\to\infty} \mathbf{P}[X_n \in A \mid X_0 = x] \le \lim_{n\to\infty} \mathbf{P}[D_n \mid X_0 = x] = 0.$$

The result then follows from Theorem 6(vi). □

The classical Gibbs sampler (see, e.g., [4]) is a special case of
Metropolis-within-Gibbs in which the proposal densities are chosen so that
$\alpha(x,y) = 1$, that is, so that all proposed moves are accepted. Now, with
either the deterministic-scan or random-scan Gibbs sampler variants, it is
certainly true that with probability 1, moves are eventually proposed in all
directions. So, since $\alpha(x,y) = 1$, it also follows that with
probability 1, the chain will eventually move in all directions. Hence, from
Theorem 12, we immediately obtain the following:

Corollary 13 ([25]). Every $\phi$-irreducible deterministic- or random-scan
Gibbs sampler is Harris recurrent.

5. Subchains of Metropolis-within-Gibbs algorithms. We now consider 
the extent to which Harris recurrence of the full chain can be "inherited" 
from Harris recurrence of various subchains. For a subset
$I = \{i_1, \ldots, i_r\} \subseteq \{1, \ldots, d\}$, let $P^{(I)}$ be the
Markov kernel which corresponds to the original Metropolis-within-Gibbs
chain, except that it is conditional on never moving in any coordinate
directions other than the coordinate directions $i_1, \ldots, i_r$. Call the
collection of kernels $P^{(I)}$, where $|I| = d - 1$, the
"$(d-1)$-dimensional subchains." These subchains can fail to be
$\phi$-irreducible:

Example 14. Suppose that

$$\mathcal{X} = \{(x_1,x_2) \in \mathbf{R}^2 : 16 < x_1^2 + x_2^2 < 25\}$$

(an annulus or "donut-shaped" state space) and that the proposal kernels
$Q_i(x,\cdot)$ simply replace $x_i$ by a draw from the
$\mathrm{Uniform}[x_i - 1, x_i + 1]$ distribution. Then the full
Metropolis-within-Gibbs chain is $\phi$-irreducible, but the one-dimensional
subchain along the line $\{x_2 = 0\}$ breaks up into two distinct
noncommunicating intervals, $(-5,-4)$ and $(4,5)$, and is therefore not
$\phi$-irreducible.
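
The geometry behind Example 14 can be illustrated with a short Python sketch. The function names are hypothetical; `can_jump_along_line` only checks whether a single $\mathrm{Uniform}[a - 1, a + 1]$ proposal from $(a, x_2)$ could land at $(b, x_2)$ with both points inside the annulus.

```python
def in_annulus(x1, x2):
    """Membership in X = {16 < x1^2 + x2^2 < 25}."""
    return 16.0 < x1 * x1 + x2 * x2 < 25.0


def can_jump_along_line(a, b, x2=0.0):
    """True if (a, x2) and (b, x2) both lie in X and |a - b| <= 1,
    so that one x1-proposal of half-width 1 can connect them."""
    return in_annulus(a, x2) and in_annulus(b, x2) and abs(a - b) <= 1.0
```

On the line $\{x_2 = 0\}$ the two pieces $(-5,-4)$ and $(4,5)$ are separated by a gap of width 8, far more than the proposal half-width 1, so no sequence of accepted $x_1$-moves connects them. Off that line (for example through the point $(3, 3)$) the full chain can travel around the annulus.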

Harris recurrence is often defined solely for $\phi$-irreducible chains
(e.g., [13]). We generalize as follows. Call a chain piecewise Harris if the
state space $\mathcal{X}$ can be partitioned into a disjoint union
$\mathcal{X} = \bigcup_{\alpha} \mathcal{X}_\alpha$ where each
$\mathcal{X}_\alpha$ is closed and the chain restricted to each
$\mathcal{X}_\alpha$ is Harris recurrent. Of course, if the partition consists
solely of a single $\mathcal{X}_\alpha$, then the full chain is Harris
recurrent. The following proposition says that the piecewise Harris property
often suffices:

Proposition 15. If a Markov chain is piecewise Harris and is also
$\phi$-irreducible, then it is Harris recurrent.

Proof. Let $\mathcal{X}_\alpha$ be any nonempty element of the partition from
the definition of piecewise Harris and let
$\mathcal{X}^* = \mathcal{X} \setminus \mathcal{X}_\alpha$. If
$\phi(\mathcal{X}^*) > 0$, then, by $\phi$-irreducibility, for each
$x \in \mathcal{X}$, there exists $n = n(x)$ with
$P^n(x,\mathcal{X}^*) > 0$. Since $\mathcal{X}_\alpha$ is closed, this implies
that $\mathcal{X}_\alpha$ is empty, contradicting our assumption. Hence,
$\phi(\mathcal{X}^*) = 0$. Since $\phi$ is nonzero, we must have
$\phi(\mathcal{X}_\alpha) > 0$. It then follows similarly that
$\mathcal{X}^*$ is empty, that is, that $\mathcal{X}_\alpha = \mathcal{X}$.
Thus, the partition contains just a single element, and so the chain is
Harris recurrent. □

In terms of the piecewise Harris property, we have the following: 

Theorem 16. Consider a random-scan Metropolis-within-Gibbs chain, as above.
Suppose that all the $(d-1)$-dimensional subchains in every
$(d-1)$-dimensional coordinate hyperplane are piecewise Harris. Then the full
chain is piecewise Harris. (In particular, by Proposition 15, if the full
chain is $\phi$-irreducible, then the full chain is Harris recurrent.)

Proof. Consider any fixed initial state $x_0 = (x_{0,1}, \ldots, x_{0,d})$.
By Theorem 12, it suffices to show that, with probability 1, when starting at
$X_0 = x_0$, the chain will eventually move in each coordinate direction.

Suppose, to the contrary, that this is false and that there is positive
probability that the chain never moves in some direction, say (for notational
simplicity) in direction $d$. Let
$H = \{y \in \mathcal{X} : y_d = x_{0,d}\}$ be the hyperplane corresponding
to never moving in the $d$th direction.

Let $I_n$ be the direction of the proposed move of the full chain at time $n$
and let $A_n = 1$ if this move is accepted, otherwise let $A_n = 0$. Let

$$C_{m,r} = \{w \in H : \mathbf{P}[I_m = d,\ A_m = 1 \mid X_0 = w] \ge 1/r\}.$$

That is, $C_{m,r}$ is the set of states in $H$ which have probability
$\ge 1/r$ of changing the $d$th coordinate, $m$ steps later, when moving
according to the subchain.

By assumption, $Q_d(x,\mathcal{X}) > 0$ and $f(x) > 0$ for all
$x \in \mathcal{X}$. This implies that the chain has positive probability,
starting from any $x \in H$, of eventually moving in the $d$th direction, that
is, of leaving $H$. Hence,

$$\bigcup_{m,r=1}^{\infty} C_{m,r} = H. \qquad (1)$$

(In fact, it suffices to consider just $m = 1$, but we do not use that fact
here.)

Consider now the subchain $P^{(1,2,\ldots,d-1)}$ restricted to the hyperplane
$H$. Since this subchain is piecewise Harris, we must have $x_0 \in H_0$ for
some closed subset $H_0 \subseteq H$ such that the subchain restricted to
$H_0$ is Harris recurrent with respect to some nonzero measure
$\psi(\cdot)$. From (1), there must exist some $m, r \in \mathbf{N}$ with
$\psi(C_{m,r}) > 0$. It then follows from Theorem 6(ii) that, with
probability 1, $C_{m,r}$ is hit infinitely often by the subchain. In other
words, conditional on the full chain never moving in the $d$th direction, it
will enter $C_{m,r}$ infinitely often.

However, each time the full chain visits $C_{m,r}$, it has probability
$\ge 1/r$ of moving in the $d$th direction $m$ steps later. It follows that,
with probability 1, the full chain will eventually jump in the $d$th direction
and hence leave $H$. This contradicts our assumption that the chain has
positive probability of never leaving $H$. □

Unfortunately, Theorem 16 still requires that we verify Harris recurrence 
of various subchains, which may be difficult. However, if the subchains of 
all dimensions all have stationary distributions, then no Harris recurrence 
needs to be checked as we see in the following: 

Corollary 17. Consider a random-scan Metropolis-within-Gibbs chain as above.
Suppose that every $r$-dimensional subchain in every $r$-dimensional
coordinate hyperplane has a stationary probability distribution, for all
$1 \le r \le d$. Then the full chain is piecewise Harris (as are all the
subchains).

Proof. Let $T_r$ be the statement that all the subchains of dimension
$\le r$ are piecewise Harris. Then $T_1$ holds by Theorem 8. Furthermore,
from Theorem 16, for any $r < d$, if $T_r$ holds, then $T_{r+1}$ must hold.
Hence, the result follows by induction. □

We then have the following:

Corollary 18. Consider a random-scan Metropolis-within-Gibbs Markov chain.
Suppose that the target density $f$ has finite $r$-dimensional Lebesgue
integral over every $r$-dimensional coordinate hyperplane of $\mathcal{X}$,
for all $1 \le r \le d$. Then the full chain is piecewise Harris (as are all
the subchains).

Proof. In this case, $f$ is an (unnormalized) density for a stationary
probability distribution of each subchain on each hyperplane. (Note that the
Lebesgue integral of $f$ over the hyperplane must be positive since we are
assuming that $\mathcal{X}$ is open and that $f > 0$ on $\mathcal{X}$.)
Hence, the result follows from Corollary 17. □

In the counterexample of Example 9, the one-dimensional $x_1$-chain fails to
have a stationary distribution along the line $\{x_2 = 0\}$ since the
integral of $f$ along the line $\{x_2 = 0\}$ is infinite.

A result similar to Corollary 18 appears as Theorem 1 of [3] under the
assumption that each subchain is $\phi$-irreducible (which, as we have seen
in Example 14, can easily fail to hold):

Corollary 19 ([3]). Consider a random-scan Metropolis-within-Gibbs Markov
chain. Suppose that the target density $f$ has finite $r$-dimensional
Lebesgue integral over every $r$-dimensional coordinate hyperplane of
$\mathcal{X}$, for all $1 \le r \le d$. If the full chain and all the
subchains are $\phi$-irreducible, then the full chain is Harris recurrent.

6. Trans-dimensional MCMC algorithms. In certain statistical setups 
(e.g., autoregressive models), the number of parameters is not fixed in ad- 
vance. This means that the state space of possible parameter values is a (dis- 
joint) union of spaces of different dimensions. Exploring such state spaces 
through MCMC algorithms requires the introduction of trans-dimensional 
MCMC. Trans-dimensional MCMC algorithms first appeared in [14] and 
[16]; their introduction into modern statistical practice (under the name 
"reversible jump") is due to the influential paper of Green [7] (see also [26]). 

Suppose that for each $m \in \mathcal{M} \subseteq \mathbb{N}$, where
$|\mathcal{M}| > 1$ (and usually $|\mathcal{M}| = \infty$), we have a space
$\mathcal{X}_m$ of dimension $d_m$, that is, $\mathcal{X}_m$ is an open
subset of $\mathbb{R}^{d_m}$. We combine these different spaces into a
single state space $\mathcal{X}$ by setting
$\mathcal{X} = \bigcup_{m \in \mathcal{M}} (\{m\} \times \mathcal{X}_m)$.
Furthermore, suppose that on each $\mathcal{X}_m$, we have an unnormalized
target density function $f_m : \mathcal{X}_m \to (0, \infty)$ with
$\int_{\mathcal{X}_m} f_m < \infty$. We then combine these into a single
probability distribution $\pi(\cdot)$ on $\mathcal{X}$ by choosing some
$p : \mathcal{M} \to (0, 1)$ with $\sum_{m \in \mathcal{M}} p(m) = 1$,
setting

\[
\pi(m, A) = p(m)\,
  \frac{\int_A f_m(x)\,\lambda_m(dx)}
       {\int_{\mathcal{X}_m} f_m(x)\,\lambda_m(dx)}
\tag{2}
\]

and using linearity; in (2), $\lambda_m(\cdot)$ is Lebesgue measure on
$\mathbb{R}^{d_m}$.

Trans-dimensional chains can proceed in a variety of ways [2, 7]. We first
consider a general class which we call full-dimensional trans-dimensional
MCMC. Fix some $0 < a < 1$ and some irreducible kernel $R(m, \cdot)$ on
$\mathcal{M}$ such that $R(m, m') > 0$ if and only if $R(m', m) > 0$. Then
at each iteration, with probability $a$, the chain proposes a
"between-model move" which replaces $(m, x)$ by $(m', x')$, where
$m' \sim R(m, \cdot)$ and where $x' \in \mathcal{X}_{m'}$ is generated by
some complicated dimension-matching scheme [7]. This proposal is then
accepted or rejected according to the usual Metropolis-Hastings scheme,
except that now the formula for the acceptance probability
$\alpha[(m, x), (m', x')]$ is more complicated and involves a Jacobian of
the transformation used to generate $x'$. Otherwise, with probability
$1 - a$, the chain leaves $m$ fixed but proposes a "within-model move,"
that is, to replace $x$ by $x' \in \mathcal{X}_m$, using a
full-dimensional Metropolis-Hastings proposal on $\mathcal{X}_m$.
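
As a rough illustration of the iteration just described, the following
Python sketch runs a full-dimensional trans-dimensional chain on a toy
problem. Everything specific in it is an assumption made for the example
and is not taken from the text: two models with state spaces R and R^2,
f_m(x) = exp(-|x|^2 / 2), weights p(1) = 0.3 and p(2) = 0.7, a = 0.5,
R(1, 2) = R(2, 1) = 1, and a dimension-matching scheme that appends or
deletes one standard normal coordinate (so the Jacobian equals 1).

```python
import math
import random

# Hypothetical toy setup (not from the paper).
P_MODEL = {1: 0.3, 2: 0.7}   # p(m), model weights summing to 1
A_BETWEEN = 0.5              # a, probability of proposing a between-model move


def log_pi(m, x):
    """Log density of pi(m, dx) as in (2), with f_m(x) = exp(-|x|^2 / 2), d_m = m."""
    log_f = -0.5 * sum(v * v for v in x)
    log_norm = 0.5 * m * math.log(2.0 * math.pi)  # log of the integral of f_m
    return math.log(P_MODEL[m]) + log_f - log_norm


def log_q(u):
    """Log density of the N(0, 1) draw used for dimension matching."""
    return -0.5 * u * u - 0.5 * math.log(2.0 * math.pi)


def step(m, x, rng):
    """One iteration: propose a between- or within-model move, then accept/reject."""
    if rng.random() < A_BETWEEN:
        # Between-model move; R(1, 2) = R(2, 1) = 1, so R cancels in the ratio.
        if m == 1:
            u = rng.gauss(0.0, 1.0)
            m_new, x_new = 2, (x[0], u)          # (x) -> (x, u), Jacobian 1
            log_ratio = log_pi(2, x_new) - log_pi(1, x) - log_q(u)
        else:
            m_new, x_new = 1, (x[0],)            # (x, u) -> (x), Jacobian 1
            log_ratio = log_pi(1, x_new) + log_q(x[1]) - log_pi(2, x)
    else:
        # Within-model move: full-dimensional random-walk Metropolis proposal.
        m_new = m
        x_new = tuple(v + rng.gauss(0.0, 1.0) for v in x)
        log_ratio = log_pi(m, x_new) - log_pi(m, x)
    accept_prob = math.exp(min(0.0, log_ratio))
    if rng.random() < accept_prob:
        return m_new, x_new
    return m, x


def fraction_in_model_2(n_steps, seed):
    """Run the chain from (1, (0.0,)); return the fraction of time spent in model 2."""
    rng = random.Random(seed)
    m, x = 1, (0.0,)
    visits = 0
    for _ in range(n_steps):
        m, x = step(m, x, rng)
        visits += (m == 2)
    return visits / n_steps
```

Under these assumptions, the long-run fraction of iterations spent in
model 2, as returned by fraction_in_model_2, should be close to
p(2) = 0.7.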

What about Harris recurrence? We have the following: 

Theorem 20. Consider a full-dimensional trans-dimensional MCMC
algorithm as above. Let $D$ be the event that no within-model move is ever
accepted. Suppose that $P[D \mid X_0 = (m, x)] = 0$ for all
$(m, x) \in \mathcal{X}$. Then the algorithm is Harris recurrent.

Proof. The proof is very similar to that of Theorem 8. Since
$P[D \mid X_0 = (m, x)] = 0$, the chain must eventually accept a
within-model move. But since the within-model proposal distributions are
full-dimensional, the probability of remaining in any set of
$\pi$-measure 0 after such a move is equal to 0. The result thus follows
from Theorem 6(vi). □

Remark. Theorem 20 remains true regardless of the details of how the
between-model moves are implemented, provided only that they preserve
the stationarity of $\pi(\cdot)$.

Remark. Even without verifying the hypothesis of Theorem 20, it is 
true that once a full-dimensional trans-dimensional MCMC algorithm makes 
at least one within-model move, then, since the within-model moves are full- 
dimensional, with probability 1, the chain will move to the set G of Propo- 
sition 2 and hence will then converge. The issue in Theorem 20 is whether 
or not such a within-model move will eventually occur with probability 1.
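
One simple situation in which the hypothesis of Theorem 20 can be checked
directly is sketched below. The uniform lower bound $\varepsilon$ is an
assumption introduced purely for illustration; it is not a condition
stated in the text, and $D_n$ is notation introduced only for this sketch.

```latex
% Illustrative assumption: from every state (m, x), a proposed within-model
% move is accepted with probability at least \varepsilon > 0.
% Let D_n be the event that no within-model move is accepted during the
% first n iterations. Each iteration proposes a within-model move with
% probability 1 - a, so
\[
P[D_n \mid X_0 = (m, x)] \le \bigl(1 - (1 - a)\varepsilon\bigr)^n .
\]
% Since D \subseteq D_n for every n, letting n \to \infty gives
\[
P[D \mid X_0 = (m, x)]
  \le \lim_{n \to \infty} \bigl(1 - (1 - a)\varepsilon\bigr)^n = 0 .
\]
```

When no such uniform bound is available, the hypothesis has to be verified
by other means.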

Now, Theorem 20 allows for the possibility that from a null set, the model
numbers $m_n$ might have positive probability of converging to $+\infty$
without ever accepting a within-model move. This seems quite plausible. On
the other hand, if the $\{m_n\}$ process is recurrent, the situation is
less clear due to the complicated details of the $(m, x) \to (m', x')$
mapping corresponding to the between-model moves. Conditional on never
accepting a within-model move, even if the chain returns to $\mathcal{X}_1$
(say) infinitely often, it might potentially be at "worse and worse" points
within $\mathcal{X}_1$ each time it returned and thus have an
ever-decreasing probability of accepting within-model moves. So, even a
chain in which $\{m_n\}$ is recurrent could conceivably fail to be Harris
recurrent. We state this as an open problem:

Open problem 1. Does there exist a $\phi$-irreducible full-dimensional
trans-dimensional MCMC algorithm as above, for which
$P[m_n = 1 \text{ i.o.} \mid X_0 = (m, x)] = 1$ for all $m \in \mathcal{M}$
and $x \in \mathcal{X}_m$, which fails to be Harris recurrent?

More generally, a trans-dimensional MCMC might not be full-dimensional.
That is, the within-model moves might themselves be of
Metropolis-within-Gibbs form. To model this, we proceed as in [2]. We
replace $\mathcal{X}_m$ by the augmented space
$\mathcal{X}_m \times [0, 1] \times [0, 1] \times \cdots$, with stationary
distribution
$\pi_m \times \mathrm{Uniform}[0, 1] \times \mathrm{Uniform}[0, 1] \times \cdots$.
We then let $h_{ij} : \mathcal{X}_i \to \mathcal{X}_j$ (acting on these
augmented spaces) be deterministic functions defined whenever
$R(i, j) > 0$, and such that $h_{ji} = (h_{ij})^{-1}$. The between-model
moves are specified by requiring that when the algorithm proposes changing
$m$ to $m'$, it simultaneously proposes changing $x$ to $h_{mm'}(x)$.

A special case is when each $h_{ij}$ function is simply the identity
function, which is plausible if $\mathcal{X}_m = [0, 1]^{d_m}$ for each
$m \in \mathcal{M}$. More generally, we consider coordinate-preserving
trans-dimensional MCMC in which each $h_{ij}$ can be decomposed as

\[
h_{ij} = h_{ij}^{(1)} \times h_{ij}^{(2)} \times \cdots,
\tag{3}
\]

where each $h_{ij}^{(\ell)} : \mathbb{R} \to \mathbb{R}$ and its inverse
are differentiable functions acting solely on the $\ell$th coordinate. That
is, the between-model moves modify each coordinate separately. Given a
current state $X_n = (m, x)$, the algorithm then proceeds as follows.
First, it replaces the coordinates $d_m + 1, d_m + 2, \ldots$ by fresh
i.i.d. draws from the Uniform[0, 1] distribution. (Of course, in practice,
we only need to generate such Uniform[0, 1] draws when they are required.
But from a theoretical perspective, it is simplest to pretend they are
updated at each iteration.) Then, with probability $a$, it proposes a
between-model move as above; otherwise, with probability $1 - a$, it
chooses one of the coordinates $1, 2, \ldots, d_m$ uniformly at random and
executes a Metropolis-within-Gibbs move for that coordinate only.
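
The steps above can be sketched in Python for the special case in which
every $h_{ij}$ is the identity. The setup is hypothetical and chosen only
to keep the example short: two models with d_m = m on the unit cube,
f_m(x) = prod_{k <= m} 2 x_k, weights p(1) = 0.4 and p(2) = 0.6, a = 0.5,
and a Gaussian random-walk proposal for the single-coordinate update. Only
the first two augmented coordinates are stored, since no model here uses
more.

```python
import random

# Hypothetical toy setup (not from the paper).
P_MODEL = {1: 0.4, 2: 0.6}   # p(m)
A_BETWEEN = 0.5              # a, probability of proposing a between-model move
D_MAX = 2                    # largest d_m; later coordinates are never needed here


def density(m, x):
    """p(m) * f_m(x_1..x_m), times Uniform[0, 1] densities for the other coordinates."""
    if any(not (0.0 <= v <= 1.0) for v in x):
        return 0.0               # outside the unit cube
    value = P_MODEL[m]
    for k in range(m):
        value *= 2.0 * x[k]      # f_m is a product of 2 * x_k; it integrates to 1
    return value


def step(m, x, rng):
    x = list(x)
    # Refresh the auxiliary coordinates d_m + 1, d_m + 2, ... with Uniform[0, 1] draws.
    for k in range(m, D_MAX):
        x[k] = rng.random()
    if rng.random() < A_BETWEEN:
        # Between-model move with h = identity: change m, keep x (Jacobian 1).
        m_new, x_new = 3 - m, list(x)
    else:
        # Metropolis-within-Gibbs: update one coordinate among 1..d_m only.
        m_new, x_new = m, list(x)
        k = rng.randrange(m)
        x_new[k] += rng.gauss(0.0, 0.3)
    ratio = density(m_new, x_new) / density(m, x)
    if rng.random() < min(1.0, ratio):
        return m_new, x_new
    return m, x


def coordinate_chain_fraction_model_2(n_steps, seed):
    """Run from (1, [0.5, 0.5]); return the fraction of iterations spent in model 2."""
    rng = random.Random(seed)
    m, x = 1, [0.5, 0.5]
    visits = 0
    for _ in range(n_steps):
        m, x = step(m, x, rng)
        visits += (m == 2)
    return visits / n_steps
```

With the identity map, the between-model acceptance ratio reduces to a
ratio of augmented target densities at the same point, which is why the
freshly drawn auxiliary coordinate enters the decision to move from model 1
to model 2.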

Theorem 21. Consider a $\phi$-irreducible, trans-dimensional MCMC chain
which is coordinate-preserving as in (3). Suppose that for each
$(m, x) \in \mathcal{X}$, when the chain starts at $X_0 = (m, x)$, then,
with probability 1, it eventually accepts at least one move in each of the
coordinate directions $1, 2, \ldots, d_m$. Then the chain is Harris
recurrent.

Proof. The proof is analogous to that of Theorem 12. The only difference
is that we do not require a move to be accepted in the coordinate
directions $d_m + 1, d_m + 2, \ldots$, nor in the direction corresponding
to the model index $m$. To justify this, note that since $\mathcal{M}$ is
countable with $p(m) > 0$ for all $m \in \mathcal{M}$, every distribution
on $\mathcal{M}$ is absolutely continuous with respect to $p$. Also, when
starting from $X_0 = (m, x)$, coordinates $d_m + 1, d_m + 2, \ldots$ are
drawn from an absolutely continuous (Uniform) distribution automatically.
So, in the context of Theorem 12, this is equivalent to having already
moved in the coordinate directions $d_m + 1, d_m + 2, \ldots$ and in the
direction of $\mathcal{M}$. Then, just as in Theorem 12, the chain will
eventually leave any set of zero stationary measure. The result thus
follows from Theorem 6(vi). □

This leads to the question of Harris recurrence for trans-dimensional
chains which are coordinate-mixing, that is, not coordinate-preserving.
Unfortunately, this situation is more complicated due to lack of control
over the composition of the $h_{ij}$ functions. For manageability, call a
trans-dimensional chain dimension-controlling if $h_{ij}$ is the identity
on coordinate directions $\ell > \max(d_i, d_j)$, that is, if $h_{ij}$ does
not mix in any more dimensions than are necessary for dimension-matching.
Now, it seems that coordinate-mixing in the $h_{ij}$ should only help the
chain to avoid null sets. However, the difficulty is that the between-model
moves could, for example, "swap" the values of two coordinates so that
updating each coordinate position once could correspond to updating one
value twice and the other value not at all. Thus, the situation is unclear
and we state this as an open problem:

Open problem 2. Consider a $\phi$-irreducible, coordinate-mixing,
dimension-controlling trans-dimensional MCMC chain, as above. Suppose that
for each $(m, x) \in \mathcal{X}$, when the chain starts at
$X_0 = (m, x)$, then, with probability 1, it eventually accepts at least
one move in each of the coordinate directions $1, 2, \ldots, d_m$. Does
this imply that the chain is Harris recurrent?

A positive answer to this question would show that general
trans-dimensional chains, like other Metropolis-within-Gibbs chains, are
Harris recurrent provided they eventually move at least once in each
coordinate direction with probability 1.
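
The "swap" difficulty behind Open problem 2 can be made concrete with a
small bookkeeping sketch in Python. It is a cartoon rather than an MCMC
implementation: the labels "a" and "b", the move encoding, and the
function track_updates are all hypothetical names introduced for this
illustration. The sketch only records which positions and which
underlying values are touched when a swap occurs between two coordinate
updates.

```python
def track_updates(moves):
    """Follow two labelled values through position updates and swaps.

    moves: sequence of ("update", position) or ("swap",) entries.
    Returns (positions_updated, values_updated) as sets.
    """
    layout = ["a", "b"]   # layout[k] is the label of the value stored at position k
    positions_updated, values_updated = set(), set()
    for move in moves:
        if move[0] == "swap":
            layout.reverse()               # a between-model move swapping the two values
        else:
            k = move[1]
            positions_updated.add(k)       # the coordinate position that was updated
            values_updated.add(layout[k])  # the underlying value that was refreshed
    return positions_updated, values_updated


# Update position 0, swap the two values, then update position 1.
positions, values = track_updates([("update", 0), ("swap",), ("update", 1)])
```

In this run both positions are updated once, yet only the value labelled
"a" is ever refreshed; the value labelled "b" is left exactly as it
started.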